A Chart Parser to Analyze Large Medical Corpora
Abstract
In this paper we describe a natural language parser for large medical corpora. Sentence parsing is a necessary step in building a sentence representation and in supporting wide-coverage semantic interpretation. In limited domains, good syntactic coverage can be obtained from phrase-structure rules. Large medical corpora, however, show strong variability in word and phrase order, which requires more specific parsing strategies. We describe a parser based on chart techniques. It parses constituents from left to right as they appear in a sentence, which enables incremental partial parsing of words and phrases coming from speech recognition input. We report first results obtained on a large corpus of cancer treatment reports.
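The left-to-right, incremental behaviour described in the abstract can be illustrated with a minimal sketch. This is not the authors' parser: the toy grammar, the lexicon, and the names `IncrementalChart`, `add_word`, and `constituents` are all hypothetical, and the sketch assumes a grammar restricted to binary rules. Each incoming word extends the chart with every constituent that ends at that word, so partial analyses are available before the sentence is complete.

```python
from collections import defaultdict

# Hypothetical toy grammar: (left child, right child) -> parent categories.
BINARY_RULES = {
    ("Det", "N"): {"NP"},
    ("V", "NP"): {"VP"},
    ("NP", "VP"): {"S"},
}

# Hypothetical toy lexicon: word -> possible categories.
LEXICON = {
    "the": {"Det"},
    "patient": {"N"},
    "treatment": {"N"},
    "received": {"V"},
}


class IncrementalChart:
    """Bottom-up chart that is extended one word at a time, left to right."""

    def __init__(self):
        self.words = []
        self.chart = defaultdict(set)  # (start, end) -> set of categories

    def add_word(self, word):
        """Add the next word and build every constituent ending at it."""
        j = len(self.words)
        self.words.append(word)
        end = j + 1
        # Lexical edge for the new word; unknown words contribute nothing,
        # but constituents found earlier stay in the chart.
        self.chart[(j, end)] |= LEXICON.get(word, set())
        # Widen spans leftwards so shorter right-hand spans are complete
        # before they are used as children of longer ones.
        for start in range(j - 1, -1, -1):
            for mid in range(start + 1, end):
                for left in self.chart.get((start, mid), ()):
                    for right in self.chart.get((mid, end), ()):
                        parents = BINARY_RULES.get((left, right), set())
                        self.chart[(start, end)] |= parents

    def constituents(self):
        """Return all (start, end, category) triples found so far."""
        return sorted(
            (s, e, c) for (s, e), cats in self.chart.items() for c in cats
        )


if __name__ == "__main__":
    parser = IncrementalChart()
    for w in "the patient received the treatment".split():
        parser.add_word(w)
        print(w, "->", parser.constituents())
```

Because the chart is kept between calls to `add_word`, a word stream from a recogniser could be fed in as it arrives; fragments such as a noun phrase remain available even if no complete sentence is ever recognised.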
Related resources
Finding Syntactic Structure in Unparsed Corpora The Gsearch Corpus Query System
The Gsearch system allows the selection of sentences by syntactic criteria from text corpora, even when these corpora contain no prior syntactic markup. This is achieved by means of a fast chart parser, which takes as input a grammar and a search expression specified by the user. Gsearch features a modular architecture that can be extended straightforwardly to give access to new corpora. The Gs...
Probabilistic Modelling of Island-Driven Parsing
Two methods for stochastically modelling bidirectionality in chart parsing are presented. A probabilistic island-driven parser that uses such models (either in isolation or in combination) has been built and tested on wide-coverage corpora. The best results are achieved by the hybrid approaches that combine both methods.
Text recognition enhancement with a probabilistic lattice chart parser
A probabilistic lattice chart parser is proposed for improving the performance of a text recognition technique. Digital images of words are recognized and alternatives for the identity of each are generated. Local word collocation statistics and a probabilistic chart parsing algorithm are used to determine the top N best parses for each sentence using the alternatives provided for the identity ...
Head-Driven Parsing for Word Lattices
We present the first application of the head-driven statistical parsing model of Collins (1999) as a simultaneous language model and parser for large-vocabulary speech recognition. The model is adapted to an online left-to-right chart parser for word lattices, integrating acoustic, n-gram, and parser probabilities. The parser uses structural and lexical dependencies not considered by n-gram model...
Generalized Probabilistic LR Parsing of Natural Language (Corpora) with Unification-Based Grammars
We describe work toward the construction of a very wide-coverage probabilistic parsing system for natural language (NL), based on LR parsing techniques. The system is intended to rank the large number of syntactic analyses produced by NL grammars according to the frequency of occurrence of the individual rules deployed in each analysis. We discuss a fully automatic procedure for constructing an...